Genomic variation induced by a low concentration of ethyl methanesulfonate (EMS) in quinoa ‘Longli-4’ variety

Quinoa (Chenopodium quinoa, 2n = 4x = 36), a super pseudocereal crop, has been introduced into China nearly 60 years. Many excellent varieties have been developed through massive selection; however, few are developed through mutagenesis breeding. In this study, the ‘Longli-4’ variety, locally cultivated in Gansu province, Northwest China, was selected for experimentation. The grains of ‘Longli-4’ were treated with ethyl methanesulfonate (EMS) at a concentration of 0.8% for 8 h. Nine plants from independent M2 families were randomly selected to investigate the mutagenesis effect of EMS on the quinoa genome. The results indicated that the single nucleotide polymorphisms (SNPs) induced by EMS were unevenly distributed across all 18 chromosomes, with an average mutation frequency of 91.2 SNPs/Mb, ranging from 4.5 to 203.5 SNPs/Mb. A significant positive correlation between the number of SNPs and chromosome length was identified through linear model analysis. Transitions from G/C to A/T were the most predominated in all variant categories, accounting for 34.4–67.2% of the mutations, and SNPs were significantly enriched in intergenic regions, representing 69.2–75.1% of the total mutations. This study provides empirical support for the application of low concentration EMS treatment in quinoa breeding. Supplementary Information The online version contains supplementary material available at 10.1186/s40529-024-00427-x.


Introduction
The pseudocereal crop quinoa (Chenopodium quinoa), an allotetraploid species (2n = 4x = 36) (Jarvis et al. 2017), has been consumed for more than 7000 years (Bazile et al. 2013;Gomez-Pando et al. 2019), and the formal breeding of quinoa was initiated in the 1960s (McElhinny et al. 2007).More than 16,000 quinoa accessions have been reported at a global level (Rojas et al. 2013).Normally, these accessions can be categorized into five ecotypes based on the differences in their growth environments, including Sea level, Valley, Altiplano, Salt flat, and Yungas (Gomez-Pando et al. 2019).
Quinoa has been cultivated in China for several decades, leading to the rapid development of diverse varieties through years of selective breeding (Lin et al. 2019).The area devoted to quinoa cultivation has significantly increased from 1,300 hectares in 2014 to 16,600 hectares in 2019, with total yields rising from 2.1 × 10 3 tons to approximately 28.8 × 10 3 tons (Cui et al. 2024).More than 20 quinoa varieties have been cultivated crossing over 20 Chinese provinces (regions), each tailed to the unique environment conditions (Cui et al. 2024).The variety known as Longli-4, for instance, is characterized by its white grains, short growth cycle, and high productivity yield, sharing a close relationship with the reference variety PI614886 (Jarvis et al. 2017;Li et al. 2022).Longli-4 has shown adaptability to grow in regions with high-altitudes, arid climates, and a range of cold to cool temperatures (Huang et al. 2020).However, conventional breeding approaches, such as mass-and individual-selection, and cross-breeding typically require several years or even decades to develop a superior variety (Dong et al. 2020).Moreover, ensuring genetic stability across generations is challenging due to quinoa's high rate of outcrossing (Li et al. 2022).Thus, there is an urgent need to utilize alternative technical methods to breed new quinoa varieties of higher quality.
Ethyl methanesulfonate (EMS) is a mutagenic agent known for its high efficiency, excellent mutagenic effects, relatively simple operation, and low production costs.It has been extensively applied in crop improvement programs for a variety of species, including Triticum aestivum (Mishra et al. 2016), Oryza sativa (Mohapatra et al. 2014;Shoba et al. 2017), Zea mays (Nie et al. 2021), Brassica napus (Harloff et al. 2012;Tang et al. 2020), and Solanum lycopersicum (Garcia et al. 2016).EMS induces an abundance of point mutations, predominantly causing G to A and C to T transition (Greene et al. 2003;Wang et al. 2023).With the publication of the quinoa genome (Jarvis et al. 2017;Zou et al. 2017;Mangelson et al. 2019), efforts to innovate quinoa germplasm have been gradually gaining momentum.In this context, research related to EMS mutagenesis on quinoa has been conducted.Wang et al. (2021) and Wen et al. (2021) have carried out preliminary investigations into the optimal conditions for EMS mutagenesis on Jingle (Shanxi) and Xinli No.1 quinoa varieties, respectively, determining that 1.5% EMS for 12 h and 1.8% EMS for 3.3 h produced the best results.
An epidermal bladder cell-free (ebcf) mutant of quinoa was isolated by treating quinoa grains of cultivar 5206 (Titicaca) with 0.2% EMS for 16 h at room temperature (Moog et al. 2022).A previous study found that the REBC gene, encoding a WD40 protein, is involved in the formation of quinoa epidermal bladder cells (EBCs) (Imamura et al. 2020).The rebc mutant generated by EMS mutagenesis in the CQ127 variety of quinoa exhibited reduced epidermal bladder cells and a significant decrease in tolerance to abiotic stress (Imamura et al. 2020).In 2018, this group identified two green hypocotyls mutants (ghy1 and ghy2); the candidate genes for both were CYP76AD1-1 (Imamura et al. 2018).Interestingly, all of these mutants have point mutations that were G to A or C to T transitions, resulting in the formation of non-synonymous substitutions or stop codons (Imamura et al. 2018(Imamura et al. , 2020;;Moog et al. 2022).Cox (2020) constructed a mutant library with 5,030 families by treating the QQ74 variety with EMS, achieving a mutation rate of 21.7 SNPs/Mb.Subsequently, Parker (2022) performed a preliminary analysis of the candidate genes for some of these mutant families with a reduced height phenotype.In one family, the candidate gene (GAI1) related to the short phenotype was identified, which formed missense or stop-gained mutations caused by G/C to A/T transitions (Parker 2022).In these studies, G/C to A/T transitions were the primary types of point mutations induced by EMS, and the variant positions were predominantly located in the intergenic regions (Cox 2020;Parker 2022).This phenomenon was also observed by Mestanza et al. (2019) with one mutation per 203 Kb.Although EMS mutagenesis of quinoa was conducted, the characteristics of SNPs generated by lower concentrations of EMS on each chromosome and the relationship between the number of SNPs and the chromosome length were relatively under reported in previous studies.
Longli-4, a widely cultivated quinoa variety in China with beneficial traits developed through years of breeding (Huang et al. 2020), has a genomic makeup closely related to the reference variety PI614886 (Li et al. 2022).This relatedness minimizes genomic background variation, facilitating the detection of the SNPs induced by EMS treatment.In this study, a lower concentration of EMS (0.8%) was utilized to create a mutant library for Longli-4.Nine mutants, exhibiting no significant phenotypic variations, were selected for further analysis.Whole-genome resequencing was conducted to explore the characteristics of the EMS-induced SNPs.We also characterized the chromosomal distribution patterns of SNPs with non-synonymous and stop-gain mutations.Our findings lay the groundwork for future mutagenesis breeding strategies and the construction of comprehensive mutant libraries in quinoa.

Plant materials and genome resequencing
Approximately 400 g of Longli-4 variety grains (M 0 ) were treated with 0.8% EMS for 8 h according to protocol provided by the Plant Seed EMS mutagenesis kit (GenMED Scientifics Inc. U.S.A).The resulting M 1 seeds were then sown in the experimental field in Haiyuan, Ningxia province, in 2019, employed the method delineated by Li et al. (2021).Upon reaching full maturity, M 1 plants were randomly selected, from which 100 individual plants were used to harvest M 2 seeds separately, while the seeds from the remaining plants were collected in bulk.In May 2020, approximately M 2 15-20 seeds from each of the 100 M 1 individuals were sown in pots (20 cm × 30 cm), which were filled with nutrient-rich soil.At the same time, seeds of the unmutated Longli-4 variety were sown as the control specimens.Two months post-planting, young leaves were sampled from each plant for sequencing purposes.The mutant plants were labeled R2-R10, corresponding to individual plants #102, #118, #120, #143, #15, #20, #69, #71, and #84, while five Longli-4 control plants were labeled R11-R15.These samples were flash-frozen in liquid nitrogen and then sent to Biomarker Technologies (Beijing, China) for genomic analyses.Sequencing libraries were constructed following the standard Illumina protocol, and paired-end sequencing was performed on the Illumina platform.It is important note that an additional CA3-1 variety, initially labeled as R1, was also included in the sequencing process.However, data from R1 were deleted from the analysis presented in this study, with the focus being solely on the 14 remaining individuals (R2-R15).

SNP calling and statistical analyses
Based on the MEM algorithm of Burrows-Wheeler Aligner (Li and Durbin 2009) (BWA 0.7.10-r789), the clean reads obtained after filtering were aligned back to the reference genome (PI614886) (Jarvis et al. 2017).A total of 200.7 Gb clean reads were acquired, with a Q30 value of 93.5%.The average mapping rate was 99.4%, of which the properly mapped rate was 96.6%, and the average sequencing depth was 9X.Redundant reads were filtered by Picard (available at http://sourceforge.net/ projects/picard/), followed by single nucleotide polymorphism (SNP) calling using the Genome Analysis Toolkit (GATK version 3.8) with the default setting (McKenna et al. 2010).
To characterize the variation of each mutant, the SNPs were further filtered as follows: (1) to ensure the genetic background consistency of the wild-type (WT) control plants, SNPs caused by individual differences were excluded to obtain SNPs shared among the five WTs; (2) to account for the genotype difference between the Longli-4 WT and the reference genomic line (PI614886), SNP variations resulting from genotype divergence were deleted to obtain the SNPs common to the WT and the reference genome; (3) to identify SNPs generated by EMS mutagenesis, SNPs shared among the nine mutants were removed; (4) this study mainly focuses on chromosomal variants; thus SNPs that could not be mounted into the 18 chromosomes were discarded; (5) it was hypothesized that each mutation occurred independently, so the loci involved in the variation were specific to each individual, and therefore the SNPs obtained were unique to at least one of the nine mutants, such as R2=(A, A), R3 = R4=… =R10=(C, C); or R2=(A, A), R3=(A, C), R4 = R5=…=R10.
The above steps ensure that the screened SNPs were generated by EMS, forming the basis for subsequent analyses.
For each mutant, a comparative analysis with WT was performed separately using RStudio (version R4.3.0) to remove the SNPs shared by mutants and WT, as well as those identified as "N".The total number of SNPs and those identified as non-synonymous mutations were characterized for their distribution on each chromosome using the CMplot R package (Yin 2022).The number of total, homozygous, non-synonymous, and homozygous and non-synonymous mutations was calculated for each mutant using do (Zhang and Jin 2021) and data.table(Dowle and Srinivasan 2021) R packages to investigate the distribution frequency of SNPs induced by EMS on the chromosomes.The base variation pattern of the SNPs was analyzed to confirm if the EMS-induced mutations exhibited bias.The number and relative abundance of SNPs were statistically measured based on variation types and genomic positions using do (Zhang and Jin 2021) and data.table(Dowle and Srinivasan 2021) R packages.Stacked bar charts were drawn to visualize these statistics.Furthermore, the relationship between total SNPs, non-synonymous mutations, stop-gain variations, and chromosome length was evaluated utilizing linear models.The dendrograms of hierarchical clustering were generated using ggplot2 R package (Wickham 2016) and tanglegram function of dendextend R package (Galili 2015).Additionally, GO enrichment analysis was performed for genes containing non-synonymous mutations and stop-gain variants of the SNPs using clusterProfiler R package (Wu et al. 2021), with the aid of annotated information from SnpEff (Cingolani et al. 2012).
To estimate the mutagenesis effects of EMS at a low concentration (0.8%), we analyzed the frequency of the SNP occurrence individually for each mutant.Table 1 showed that the SNP frequencies vary significantly among mutants and also among different chromosomes within the same mutant.For the nine mutants examined, the lowest SNP frequency was found in mutant R7, with an average of 4.5 SNPs/Mb (range from 1.9 to 9.8 SNPs/Mb), while the highest was in mutant R2, with 203.5 SNP/Mb (range from 65.3 to 366.6 SNPs/Mb) (Table 1).Linear models and dendrograms topology comparison were constructed to explore potential relationships between the total number of SNPs, non-synonymous mutations, stop-gain mutations, and chromosome lengths for each mutant (as shown in Figs. 3 and 4, and Fig. S1-S9).Strong correlations between the total numbers of SNPs and the chromosome lengths were observed in eight out of nine mutants, with R 2 ranging from 0.20 to 0.78 (p values ranging from 6.3e-07 to 0.035; Fig. 3d; Fig. S1-S9).The strongest correlation was identified in mutant R8 (R 2 = 0.78, p = 6.3e-07;Fig. 3d), while no significant correlation was detected in mutant R3 (R 2 = 0.10, p = 0.11; Fig. 3a).The dendrogram topology comparison showed a similar pattern to the correlation analyses, with entanglement values for mutants R2 and R4-R10 ranging from 0.05 to 0.42.However, mutant R3 had a higher entanglement value of 0.48 (Fig. 4a and c; Fig. S1 and S3-S9).Furthermore, strong correlations between the number of non-synonymous mutations and chromosome length were detected in all mutants except R3, with p-values ranging from 1.7e-05 to 0.018 (Fig. 3b and e; Fig. S1-S9).As for stop-gain variants' correlation with chromosome length, strong correlations were found only   Typically, SNPs are categorized as either transitions (purine-to-purine [G/A] or pyrimidine-to-pyrimidine transitions [C/T]) or transversions.Transitions are more common than transversions.To assess EMS-induced SNP preferences in quinoa, we independently analyzed the base variation of SNPs for each mutant and chromosome.Most SNPs were transitions, accounting for 59.1%, 60.7%, 60.0%, 59.7%, 60.8%, 65.4%, 78.9%, 62.0%, and 68.3% of the total variations across the nine mutants, respectively, a proportion significantly higher than that of the transversions (Fig. 5a; Fig. S10).Within the transitions, the frequency of C/G to T/A was higher than that of T/A to C/G in all nine mutants (ranging from 34.4 to 67.2% vs. 11.7 to 26.3%).For transversions, the relative percentages of C/G to A/T, T/A to A/T, and T/A to G/C were quite similar across the board, whereas the C/G to G/C transformation represented the lowest proportion of total SNPs in all mutants (Fig. 5a; Fig. S10).According to the SNP annotations using snpEFF, most SNPs were located in the intergenic regions (ranging from 69.2 to 75.1%), while 16.0-20.2%SNPs were found in upstream and downstream regions (Fig. 5b; Fig. S11).Detailed analysis revealed that only 1.3-2.2%SNPs resulted in non-synonymous mutations across the nine mutants (Fig. 5b; Fig. S11).

GO enrichment of the non-synonymous and stop-gain SNPs
Non-synonymous mutations and the introduction of premature stop-codon typically result in phenotypic variation.The number of genes affected by non-synonymous mutations and stop-gain SNPs was determined for each mutant (refer to Tables 2 and 3).The average number of genes with non-synonymous mutations spanned from 10.6 to 144.4 across 18 chromosomes in the nine mutants (Table 2), while the number of genes with the stop-gain SNPs was notably lower, ranging from 1.0 to 6.3 (Table 3).These stop-gain SNPs accounted for only 0.03-0.09% of total variation observed in all mutants.Gene Ontology (GO) enrichment analysis was employed to identify the potential functions of the genes affected by these mutations.For the R2 mutant, two GO terms were significantly enriched: ' ADP binding' (GO:0043531, p = 2.48e-06) and 'oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen' (GO:0016705, p = 0.0002 (Fig. 6a).The mutants R3, R4, R5, R9, and R10 each had a single GO term significantly enriched, specifically ' ADP binding' (GO:0043531, p-values ranging from4.11e-05 to 5.88e-05), 'sulfotransferase activity' (GO:0008146, p = 0.0002), and 'phosphotransferase activity, alcohol group as acceptor' (GO:0016773, p = 6.21e-05) (Fig. 6b-d, f and g).For the R8 mutant, two GO terms were enriched: 'phosphatidylinositol binding' (GO:0035091, p = 0.0004), and 'COPII vesicle coat' (GO:0030127, p = 0.0007 (Fig. 6e).No GO terms were significantly enriched in the R6 and R7 mutants.

Discussion
EMS mutagenesis efficiency is influenced by the concentration of the mutagen, the duration of treatment, and seed size, therefore optimal conditions for mutagenesis vary among different varieties (Chen et al., 2023).Previous studies have identified the most suitable concentrations and times for EMS mutagenesis in various quinoa varieties: 1.5% for 12 h in Jingle (Wang et al. 2021), 1.8% for 3.3 h in Xin NO.1 (Wen et al. 2021), 2.0% or 2.5% for 6 h in QQ74 (Cox 2020), and 2.0% for 8 h in Regalona-Baer (Mestanza et al. 2019).While these higher concentrations are advantageous for constructing mutant libraries, they may not be ideal for quinoa breeding due to the resulting severe phenotypic variation (Cox 2020).In comparison to high doses mutagenesis, lowerdose may reduce the number of deleterious mutations and decrease the toxicity to the seeds and/or seedlings, thereby increasing the potential for successful mutations and improving the survival rate, as well as possibly enhancing the resistance of the plant to adverse environments (Hu et al. 2017;Shen 2018;Wang 2018;Gillmor and Lukowitz 2020;Yan 2023).In the current study, Longli-4 variety seeds were treated with a significantly lower concentration of EMS at 0.8%, which did not cause obvious phenotypic variation in M 1 and M 2 plants.However, the average mutation rate was 91.2 SNPs/Mb (ranging from 4.5 to 203.5 SNPs/Mb; Table 1), surpassing the rates observed in QQ74 (21.8 SNPs/Mb) (Cox 2020) and Regalona-Baer (4.93 SNPs/Mb) (Mestanza et al. 2019) varieties of quinoa.These findings suggest that lower   EMS concentrations can induce substantial genetic variation while inflicting minimal genomic damage, thereby maintaining the plants' phenotype.In line with previous report that low concentrations of EMS mutagenesis results in subtle phenotypic changes (Hu et al. 2017), but its application offers a high potential to increase the advantageous variation and generate alternative combinations of genotypes, which is of great importance for selective breeding of quinoa to adapt to different marginal environments.
Genome resequencing has been applied to investigate the mutations induced by mutagens in various crops, including SNPs and InDels (Shirasawa et al. 2016;Xiao et al. 2019;Tang et al. 2020).Typically, EMS induces G/C to A/T transition (Greene et al. 2003), and this type of variation was the most predominant among the nine mutants in this study, accounting for 34.4-67.2% of the total variation (Fig. 5).This observation consists with findings from other crops like Triticum aestivum (Wang et al. 2023), Oryza sativa (Yan et al. 2021), Brassica rapa ssp.pekinensis (Sun et al. 2022), and Brassica napus (Tang et al. 2020).Moreover, the SNPs induced by EMS mutagenesis are predominately located in intergenic regions, with 38.39% in B. napus (Tang et al. 2020), 53.9% in B. rapa ssp.pekinensis (Sun et al. 2022), and 22% in Sorghum bicolor (Jiao et al. 2016) being identified as such.In this study, 69.2-75.1% of SNPs were found in the intergenic regions (Fig. 5).It is important to note that sequence and codon preferences for EMS mutagenesis has been reported in Oryza sativa (Henry et al. 2014;Yan et al. 2021), indicating that similar preferences might also existed in quinoa.

Fig. 1
Fig. 1 The distribution patterns of SNPs on 18 chromosomes in nine mutants (R2-R10) in comparison with the Longli-4 wild type plants (a-i).The chromosomes of the A sub-genome are shown in red, and those of the B sub-genome are shown in pale blue

Fig. 2
Fig. 2 The distribution patterns of non-synonymous variations on 18 chromosomes in nine mutants (R2-R10) in comparison with the Longli-4 control plants (a-i).The chromosomes of the A sub-genome are shown in red, and those of theB sub-genome are shown in pale blue

Fig. 3
Fig. 3 Linear regression analyses of the chromosome length and SNPs number in R3 and R8.Linear regression diagram of the chromosome length against total SNPs, non-synonymous, and stop-gain mutations for R3 (a-c) and R8 (d-e)

Fig. 4
Fig. 4 Dendrogram topology comparison of chromosomes length and total SNPs number, and comparison of total SNPs and non-synonymous mutations for R3 (a, b) and R8 (c, d).The connection lines are in red to highlight two sub-trees that are present in both dendrograms

Fig. 5
Fig. 5 Stacked bar charts of the total variations in R2-R10 individuals.(a) SNPs are categorized into six groups based on transitions and transversions, and the percentage of each group in R2-R10 mutants is presented as a stacked bar.(b) SNPs are categorized into seven groups based on annotation, and relative abundance values of seven groups in R2-R10 are shown as a stacked bar

Table 1
The SNP mutation frequency on each chromosome (SNP/Mb)

Table 2
The number of genes with non-synonymous SNPs

Table 3
The number of genes with stop-gain SNPs